Second Order Expansion of t-statistic in Autoregressive Models

by Anna Mikusheva
MIT,  Department  of  Economics 

Abstract 

The purpose of this paper is to derive a second order expansion of the t-statistic in the AR(1) model in the local to unity asymptotic approach. I show that Hansen's (1998) method for confidence set construction achieves a second order improvement in the local to unity asymptotic approach compared with Stock's (1991) and Andrews' (1993) methods.

Key  Words:  autoregressive  process,  confidence  set,  local  to  unity  asymptotics,  uniform 
convergence 

1     Introduction 

The paper deals with inference about the persistence parameter (AR coefficient) $\rho$ in AR(1) models. The classical Wald confidence interval typically has low coverage in finite samples, especially if the true value of $\rho$ is close to unity, as is the case for most macroeconomic time series. The Wald-type interval is based on classical asymptotic theory, that is, the setup in which $|\rho| < 1$ is held fixed and the sample size $n$ goes to infinity. The classical asymptotic laws (the CLT and the Law of Large Numbers) do not hold uniformly over the interval $\rho \in (0, 1)$: the convergence becomes slower as $\rho$ approaches 1, and both laws fail for $\rho = 1$. An alternative asymptotic approach, local to unity asymptotics, considers sequences of models with $\rho_n = 1 + c/n$ as $n$ goes to infinity. According to Mikusheva (2007) and Andrews and Guggenberger (2007a,b), local to unity asymptotics leads to uniform inference on $\rho$, whereas classical asymptotics does not.

There are at least three methods that can be used to construct an asymptotically correct confidence set for $\rho$: a method based on the local to unity asymptotic approach (Stock (1991)), the parametric grid bootstrap (Andrews (1993)) and the non-parametric grid bootstrap (Hansen (1999)). The validity of the methods was proved in Mikusheva (2007).

This paper compares the three methods on the grounds of the accuracy of the asymptotic approximation they provide. All three methods are asymptotically first order correct, that is, the coverage of the confidence sets converges uniformly to the confidence level as the sample size increases. The question I address is the speed of this convergence. It is well known that Hansen's grid bootstrap achieves a second order refinement in the classical asymptotic approach, whereas the two other methods (Andrews' and Stock's) do not. I address the same question in the local to unity asymptotic approach. My answer is that the non-parametric grid bootstrap (Hansen's method) achieves the second order refinement, that is, the speed of coverage probability convergence is $o(n^{-1/2})$, whereas the other two methods in general guarantee only an $O(n^{-1/2})$ speed of convergence in the local to unity asymptotic approach. To compare the three methods I find an asymptotic expansion of the t-statistic around its limit in local to unity asymptotics.

A second-order distributional expansion is an approximation of the unknown distribution function of the statistic of interest (the t-statistic in our case) by some other function up to the order of $o(n^{-1/2})$. One example of a second order distributional expansion is the first two terms of the well-known Edgeworth expansion.

There are several differences between the expansion obtained in this paper and the Edgeworth expansion. First of all, the Edgeworth expansion is an expansion around the normal distribution. In our case we expand the t-statistic around its local to unity asymptotic limit, which is a non-normal distribution. Secondly, it is known that the first two terms of the Edgeworth expansion do not constitute a distribution function themselves. In particular, their sum can be non-monotonic and need not range from 0 to 1. One special feature of my expansion is that it approximates the distribution function of the t-statistic by a cumulative distribution function (cdf) that can be easily simulated.

And finally, as opposed to the Edgeworth expansion, which comes from expanding the characteristic function, my expansion comes from stochastic embedding and the strong approximation principle. The same idea was used in a very inspirational work by Park (2003a). He obtained a second order expansion of the Dickey-Fuller t-statistic for testing a unit root. The expansion I obtain is a "probabilistic" one. That is, I construct a random variable on the same probability space as the t-statistic in such a way that the difference between the constructed variable and the t-statistic is of the order $o(n^{-1/2})$ in probability. I also show that under additional moment assumptions it leads to a second order "distributional" expansion.

The distributional expansion allows me to show that Hansen's grid bootstrap achieves the second order improvement in the local to unity setting compared to Andrews' method and the local to unity asymptotic distribution. The intuition for the non-parametric grid bootstrap improvement is the classical one: Hansen's grid bootstrap uses the information about the distribution of the error terms.

The paper contributes to the literature on bootstrapping autoregressive processes and closes the discussion on making inferences on persistence in the AR(1) model. Here are some of the known results on the bootstrap of AR models. Bose (1988) showed in classical asymptotics that the usual bootstrap provides a second order improvement compared to the OLS asymptotic distribution. However, Basawa et al. (1991) showed that the usual bootstrap fails (has asymptotically wrong size) if the true process has a unit root. Their result can be easily generalized to local to unity sequences. Park (2003b) showed that the usual bootstrap achieves higher accuracy than the asymptotic normal approximation of the t-statistic for weakly integrated sequences (sequences with an AR coefficient converging to the unit root at a speed slower than $1/n$). The intuition behind Park's result is that the ordinary bootstrap uses the information about the closeness of the AR coefficient to the unit root. His expansion is non-standard and the reason for the bootstrap improvement is also unusual (usually the bootstrap achieves higher accuracy because it uses information about the distribution of the error term).

I draw many ideas from a paper by Park (2003a), where he proves the second order improvement of bootstrapped tests for the unit root. He found an asymptotic expansion of the t-statistic for the unit root in terms of functionals of Brownian motion. My expansions for local to unity sequences follow an idea similar to his.

The rest of the paper is organized in the following way. Section 2 introduces notation. Section 3 obtains a probabilistic embedding of the error terms and a probabilistic expansion of the t-statistic. Section 4 shows that the probabilistic expansion from the previous section leads to a distributional expansion. Section 5 establishes a similar expansion for a bootstrapped statistic and obtains the main result of the essay. All proofs are left to the Appendix.

2     Notations  and  preliminary  results 

Consider the process

$$y_j = \rho y_{j-1} + \varepsilon_j, \qquad j = 1, \dots, n. \qquad (1)$$

We assume that $y_0 = 0$. The error terms $\varepsilon_j$ are i.i.d. with mean zero, unit variance and finite absolute moment of order $r$. The procedure of testing and constructing confidence sets is based on the t-statistic. Let

$$t(y, \rho, n) = \frac{\sum_{j=1}^{n} (y_j - \rho y_{j-1})\, y_{j-1}}{\sqrt{\sum_{j=1}^{n} y_{j-1}^2}}$$

be the t-statistic for testing the true value of the parameter $\rho$ using the sample $\{y_j\}_{j=1}^{n}$. The classical asymptotic approach states that for every fixed $|\rho| < 1$, as $n$ increases to infinity, we have

$$t(y, \rho, n) \Rightarrow N(0, 1).$$

According to the local to unity asymptotic approach, if $\rho_n = 1 + c/n$, $c \le 0$, then

$$t(y, \rho_n, n) \Rightarrow \frac{\int_0^1 J_c(x)\, dw(x)}{\sqrt{\int_0^1 J_c^2(x)\, dx}},$$

where $J_c(x) = \int_0^x e^{c(x-s)}\, dw(s)$ is an Ornstein-Uhlenbeck process and $w(\cdot)$ is a standard Brownian motion.
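The local to unity limit above has no closed-form distribution, but it can be approximated by simulation. The following is a minimal sketch, assuming an Euler discretization of the Ornstein-Uhlenbeck equation $dJ_c = cJ_c\,dt + dw$; the function names, the grid size `steps` and the number of draws are illustrative choices, not part of the paper.

```python
import math
import random


def simulate_limit_t(c, steps=200, rng=None):
    """Draw one value of int_0^1 J_c dw / sqrt(int_0^1 J_c^2 dx).

    J_c solves dJ = c*J dt + dw with J(0) = 0; the path is built by an
    Euler scheme on a grid of `steps` points (an illustrative choice).
    """
    rng = rng or random.Random()
    dt = 1.0 / steps
    j = 0.0      # current value of J_c
    num = 0.0    # running Ito sum for int J_c dw
    den = 0.0    # running Riemann sum for int J_c^2 dx
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        num += j * dw          # left-endpoint (Ito) evaluation
        den += j * j * dt
        j += c * j * dt + dw   # Euler step of the Ornstein-Uhlenbeck equation
    return num / math.sqrt(den)


def limit_quantiles(c, probs, draws=2000, seed=0):
    """Monte Carlo quantiles of the limit law for a given c."""
    rng = random.Random(seed)
    sample = sorted(simulate_limit_t(c, rng=rng) for _ in range(draws))
    return [sample[min(draws - 1, int(p * draws))] for p in probs]
```

For example, `limit_quantiles(0.0, [0.05, 0.95])` approximates the critical values of the unit root case, whose distribution is skewed to the left.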

As was shown in Mikusheva (2007), the classical asymptotic approximation is not uniform. In particular, if $z_\alpha$ is the $\alpha$-quantile of the standard normal distribution, then

$$\lim_{n\to\infty}\ \inf_{|\rho|<1} P_\rho\{z_{\alpha/2} < t(y, \rho, n) < z_{1-\alpha/2}\} < 1 - \alpha.$$


As a result, the usual OLS confidence set would have poor coverage in finite samples if we allow $\rho$ to be arbitrarily close to the unit root.

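The local to unity asymptotic approach, on the contrary, is uniform (Mikusheva (2007), Theorem 2). Namely,

$$\lim_{n\to\infty}\ \sup_{\rho\in[0,1]}\ \sup_x \left|P_\rho\{t(y, \rho, n) < x\} - F^c_{n,\rho}(x)\right| = 0,$$

where $F^c_{n,\rho}(x) = P\left\{\int_0^1 J_c(t)\,dw(t) \Big/ \sqrt{\int_0^1 J_c^2(t)\,dt} < x\right\}$ with $c = n\log(\rho)$.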

The use of local to unity asymptotics in order to construct a confidence set was suggested by Stock (1991). It can be implemented as a "grid" procedure. One needs to test a set of hypotheses $H_0: \rho = \rho_0$ (in practice the testing could be performed over a fine grid of values of $\rho_0$). A test compares the t-statistic $t(y, \rho_0, n)$ with critical values which are quantiles of the distribution $F^c_{n,\rho_0}(x)$. The acceptance set is an asymptotic confidence set.

Two alternatives to the procedure above are Andrews' parametric grid bootstrap and Hansen's non-parametric grid bootstrap. The methods differ in the choice of critical values. In particular, in Andrews' grid bootstrap the critical values are taken as quantiles of the finite sample distribution of the t-statistic in a model with normal errors: $F^N_{n,\rho_0}(x) = P_{\rho_0}\{t(z, \rho_0, n) < x\}$. Here $z_t$ is an AR(1) process with AR coefficient $\rho_0$ and normal errors. In Hansen's grid bootstrap we use quantiles of $F^*_{n,\rho_0}(x)$, the finite sample distribution of the t-statistic for a bootstrapped model with the null imposed. More precisely, let $y^*_t = \rho_0 y^*_{t-1} + \varepsilon^*_t$, where $\varepsilon^*_t$ are sampled from the residuals of the initial OLS regression; then $F^*_{n,\rho_0}(x) = P^*_{\rho_0}\{t(y^*, \rho_0, n) < x\}$.
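The non-parametric grid bootstrap described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the author's code: the errors are taken to have unit variance (as in the paper's t-statistic), residuals are centered and rescaled before resampling, and the helper names, the number of replications `reps` and the equal-tailed quantile rule are hypothetical choices.

```python
import math
import random


def t_stat(y, rho):
    """t-statistic from the text for unit-variance errors, with y_0 = 0."""
    lagged = [0.0] + list(y[:-1])
    num = sum((yj - rho * yl) * yl for yj, yl in zip(y, lagged))
    den = math.sqrt(sum(yl * yl for yl in lagged))
    return num / den


def simulate_ar1(rho, errors):
    """Build y_j = rho * y_{j-1} + e_j starting from y_0 = 0."""
    path, prev = [], 0.0
    for e in errors:
        prev = rho * prev + e
        path.append(prev)
    return path


def standardized_residuals(y):
    """OLS estimate of rho and residuals centered and scaled to unit variance."""
    lagged = [0.0] + list(y[:-1])
    rho_hat = sum(a * b for a, b in zip(y, lagged)) / sum(b * b for b in lagged)
    res = [a - rho_hat * b for a, b in zip(y, lagged)]
    mean = sum(res) / len(res)
    centered = [r - mean for r in res]
    scale = math.sqrt(sum(r * r for r in centered) / len(centered))
    return rho_hat, [r / scale for r in centered]


def grid_bootstrap_set(y, grid, level=0.90, reps=199, seed=0):
    """Return the grid points rho0 not rejected by the bootstrap t-test."""
    rng = random.Random(seed)
    _, res = standardized_residuals(y)
    n, alpha = len(y), 1.0 - level
    # Indices of the equal-tailed bootstrap quantiles among `reps` sorted draws.
    lo_idx = max(0, round(alpha / 2 * (reps + 1)) - 1)
    hi_idx = min(reps - 1, round((1 - alpha / 2) * (reps + 1)) - 1)
    accepted = []
    for rho0 in grid:
        # Bootstrap distribution of the t-statistic with the null rho0 imposed.
        draws = sorted(
            t_stat(simulate_ar1(rho0, rng.choices(res, k=n)), rho0)
            for _ in range(reps)
        )
        if draws[lo_idx] <= t_stat(y, rho0) <= draws[hi_idx]:
            accepted.append(rho0)
    return accepted
```

The returned list is the acceptance set over the grid. Replacing the resampled residuals by standard normal draws would give a sketch of the parametric variant instead.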

Previously, Mikusheva (2007) proved that all three methods are uniformly asymptotically correct. My goal is to explore the second order properties of the methods in the local to unity asymptotic approach. I will show that Hansen's bootstrap provides the second order improvement in the local to unity asymptotic approach. That is, I consider a sequence of models $\rho = \rho_n = \exp\{c/n\}$ as $n$ increases to infinity (this sequence of models is called a "nearly integrated" process). The goal is to obtain the second order expansion of $t(y, \rho_n, n)$ along this sequence of models. The next section is devoted to the probabilistic expansion.


3     Stochastic  embedding. 

Assumptions A. Assume that the error terms $\varepsilon_j$ are i.i.d. with mean zero, variance $\sigma^2 = 1$ and $E|\varepsilon_j|^r < \infty$ for some $r > 2$.

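According to the Skorokhod embedding scheme, there exist a Brownian motion $w$ and a sequence of i.i.d. variables $\tau_j$ on an extended probability space such that the sequence of error terms has the same distribution as a sequence of increments of the stopped Brownian motion:

$$\{\varepsilon_j\}_{j\ge 1} \stackrel{d}{=} \Big\{ w\Big(\sum_{i=1}^{j}\tau_i\Big) - w\Big(\sum_{i=1}^{j-1}\tau_i\Big) \Big\}_{j\ge 1}. \qquad (2)$$

It is also known that $E\tau_j = \sigma^2 = 1$ and $E|\tau_j|^{r/2} \le K_r E|\varepsilon_j|^r$, where $K_r$ is an absolute constant. We define $T_{n,j} = \frac{1}{n}\sum_{i=1}^{j}\tau_i$. Let us consider a sequence of random vectors $v_j = \big(\varepsilon_j/\sigma,\ (\tau_j - \sigma^2)/\sigma^2,\ (\varepsilon_j^2 - \sigma^2)/\sigma^2\big)$ and $B_n(t) = \frac{1}{\sqrt n}\sum_{j=1}^{[nt]} v_j = (w_n(t), V_n(t), U_n(t))$. Park (2003a) proved that $B_n \to_d B = (w, V, U)$, where $B$ is a Brownian motion with covariance matrix $\Sigma$ given by

$$\Sigma = \begin{pmatrix} 1 & \mu_3/3\sigma^3 & \mu_3/\sigma^3 \\ \mu_3/3\sigma^3 & \kappa/\sigma^4 & (\mu_4 - 3\sigma^4 + 3\kappa)/12\sigma^4 \\ \mu_3/\sigma^3 & (\mu_4 - 3\sigma^4 + 3\kappa)/12\sigma^4 & (\mu_4 - \sigma^4)/\sigma^4 \end{pmatrix}. \qquad (3)$$

Here $E\varepsilon_j^2 = \sigma^2 = 1$, $E\varepsilon_j^3 = \mu_3$, $E\varepsilon_j^4 = \mu_4$, $E(\tau_j - \sigma^2)^2 = \kappa$.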
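To make the role of the moments concrete, the covariance matrix (3) can be assembled directly from $(\sigma^2, \mu_3, \mu_4, \kappa)$. The sketch below is only an illustration: the function name `park_covariance` is hypothetical, and the normal-error example assumes an embedding with $\tau_j \equiv 1$, so that $\kappa = 0$, which is consistent with the statement $V(t) = 0$ in Remark 2.

```python
import math


def park_covariance(sigma2, mu3, mu4, kappa):
    """Covariance matrix (3) of B = (w, V, U) as a nested list."""
    s3 = math.sqrt(sigma2) ** 3   # sigma^3
    s4 = sigma2 ** 2              # sigma^4
    cross = (mu4 - 3.0 * s4 + 3.0 * kappa) / (12.0 * s4)
    return [
        [1.0, mu3 / (3.0 * s3), mu3 / s3],
        [mu3 / (3.0 * s3), kappa / s4, cross],
        [mu3 / s3, cross, (mu4 - s4) / s4],
    ]


# Normal errors with the (assumed) embedding tau_j = 1, hence kappa = 0.
normal_case = park_covariance(sigma2=1.0, mu3=0.0, mu4=3.0, kappa=0.0)
```

In the normal case the matrix is diagonal with a zero variance for $V$, so $V$ vanishes and $w$ and $U$ are uncorrelated.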

Park (2003a) also proved that $B_n$ and $B$ can be defined on the same probability space in such a way that $B_n \to_{a.s.} B$. Let $N(t) = w(1 + t) - w(1)$, and let $M(t)$ be a Brownian motion independent of $w$.

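Theorem 1 Let $\rho_n = 1 + c/n$, $c < 0$. Assume that $\varepsilon_j$ satisfy the set of Assumptions A with $r > 8$. Then one has the following probabilistic expansions:

(a)

$$\frac{y_k}{\sqrt n} - J_c(T_{n,k}) = -\frac{c}{\sqrt n}\int_0^{k/n} e^{c(k/n - s)} J_c(s)\, dV(s) + o_p(n^{-1/2})$$

(b)

$$\frac{1}{n}\sum_{k=1}^{n} y_{k-1}\varepsilon_k = \int_0^1 J_c(x)\,dw(x) + n^{-1/4} J_c(1) M(V) + \frac{1}{\sqrt n}\left(-c\int_0^1\!\!\int_0^t e^{c(t-s)} J_c(s)\,dV(s)\,dw(t) + J_c(1) N(V) + \frac{1}{2}M^2(V) + \frac{1}{2}U\right) + o_p(n^{-1/2})$$

(c)

$$\frac{1}{n^2}\sum_{k=1}^{n} y_{k-1}^2 = \int_0^1 J_c^2(x)\,dx - \frac{2c}{\sqrt n}\int_0^1 J_c(x)\int_0^x e^{c(x-s)} J_c(s)\,dV(s)\,dx - \frac{1}{\sqrt n}\int_0^1 J_c^2(x)\,dV(x) + \frac{1}{\sqrt n} J_c^2(1) V + o_p(n^{-1/2})$$

(d)

$$\frac{1}{n^{3/2}}\sum_{k=1}^{n} y_{k-1} = \int_0^1 J_c(x)\,dx - \frac{c}{\sqrt n}\int_0^1\!\!\int_0^x e^{c(x-s)} J_c(s)\,dV(s)\,dx - \frac{1}{\sqrt n}\int_0^1 J_c(x)\,dV(x) + \frac{1}{\sqrt n} J_c(1) V + o_p(n^{-1/2})$$

(e)

$$t(y, \rho_n, n) = t^c + n^{-1/4} f + n^{-1/2} g + o_p(n^{-1/2}),$$

where

$$t^c = \frac{\int_0^1 J_c(x)\,dw(x)}{\sqrt{\int_0^1 J_c^2(x)\,dx}}, \qquad f = \frac{J_c(1) M(V)}{\sqrt{\int_0^1 J_c^2(x)\,dx}},$$

$$g = \frac{1}{\sqrt{\int_0^1 J_c^2(x)\,dx}}\left(-c\int_0^1\!\!\int_0^t e^{c(t-s)} J_c(s)\,dV(s)\,dw(t) + J_c(1) N(V) + \frac{1}{2}M^2(V) + \frac{1}{2}U\right) + \frac{t^c}{2\int_0^1 J_c^2(x)\,dx}\left(2c\int_0^1 J_c(x)\int_0^x e^{c(x-s)} J_c(s)\,dV(s)\,dx + \int_0^1 J_c^2(x)\,dV(x) - J_c^2(1) V\right).$$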


The expansions from Theorem 1 are probabilistic. Namely, we approximate a random variable $t(y, \rho_n, n)$, whose distribution is unknown, by another random variable $\xi_n$ (whose distribution is known or could be simulated) with accuracy $o(n^{-1/2})$ in probability: $P\{|\xi_n - t(y, \rho_n, n)| > \epsilon n^{-1/2}\} \to 0$. Probabilistic expansions are not of interest by themselves (since they are abstract constructions); rather, they are building blocks for the distributional expansions described in the next section.

The random variables on the right-hand side are functionals of several Brownian motions $B(t) = (w(t), V(t), U(t))$ and $M(t)$. The covariance matrix of $B(t)$ depends only on some characteristics $(\sigma^2, \mu_3, \mu_4, \kappa)$ of the distribution function of $\varepsilon_j$, namely on the first four moments of $\varepsilon_j$ and a characterization of non-normality $\kappa$ (the parameters are defined above). $M(t)$ is independent of $B(t)$. As a result the distribution of the approximating variable depends only on $\psi = (\sigma^2, \mu_3, \mu_4, \kappa, c)$. The distribution of the approximating variable can be easily simulated.

Remark 1 If one has an exact unit root ($c = 0$), then the expansion is exactly equal to the expansion obtained by Park (2003a).


Remark 2 If $\varepsilon_j$ are normally distributed, then $V(t) = 0$ and $w(\cdot)$ is independent of $U(\cdot)$. It implies that $t = t^c + \frac{1}{2\sqrt n}\frac{U}{\sqrt{\int_0^1 J_c^2(x)\,dx}} + o_p(n^{-1/2})$, where $U$ is independent of $w$. So, according to this probabilistic expansion, Stock's and Andrews' methods are the same up to an independent summand of order $O_p(n^{-1/2})$. I show in the next section that they are the same distributionally up to the order of $o(n^{-1/2})$.

4     Distributional  expansion 

For making inferences we need asymptotic theory to approximate the unknown distribution of the t-statistic $t(y, n, \rho_n)$. In the previous section we established a probabilistic approximation. In particular, we found a sequence of random variables $\xi_n$ with known distribution depending on a vector of parameters $\psi$ (the distribution can be simulated if $\psi$ is known) such that $t(y, n, \rho_n) = \xi_n + o_p(n^{-1/2})$ for $\rho_n = 1 + c/n$. That is,

$$\lim_{n\to\infty} P_{\rho_n}\left\{ |t(y, n, \rho_n) - \xi_n| > \frac{\epsilon}{\sqrt n} \right\} = 0 \quad \text{for all } \epsilon > 0.$$

The goal of this section is to get a distributional expansion. By a distributional expansion of the second order I mean a sequence of real-valued functions $G_n(\cdot)$ such that

$$P\{t(y, n, \rho_n) < x\} = G_n(x) + o(n^{-1/2}). \qquad (4)$$

In general, $G_n(\cdot)$ is not required to be a cdf of any random variable.

An example of a distributional expansion is the second order Edgeworth expansion. Initially, the Edgeworth expansion was stated as an approximation to the distribution of normalized sums of random variables. Nowadays, Edgeworth-type expansions have been obtained for many statistics having a normal limiting distribution. Traditionally, Edgeworth-type expansions are obtained from expansions of characteristic functions. It is also known that in Edgeworth expansions the function $G_n$ is usually not a cdf of any random variable. In particular, $G_n$ is not monotonic in many applications.
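The failure of monotonicity can be checked numerically. The sketch below uses the textbook one-term Edgeworth correction for a standardized sample mean, $\Phi(x) - \gamma (x^2 - 1)\varphi(x)/(6\sqrt n)$ with skewness $\gamma$; this particular formula and the values $\gamma = 2$, $n = 10$ are illustrative assumptions and do not come from the paper.

```python
import math


def std_normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def edgeworth_two_term(x, skewness, n):
    """Normal cdf plus the n^(-1/2) skewness correction for a standardized mean."""
    correction = skewness * (x * x - 1.0) * std_normal_pdf(x) / (6.0 * math.sqrt(n))
    return std_normal_cdf(x) - correction


# With skewness 2 and n = 10 the approximation is negative and decreasing
# in the far left tail, so it cannot be a cdf.
left = edgeworth_two_term(-3.5, skewness=2.0, n=10)
right = edgeworth_two_term(-3.0, skewness=2.0, n=10)
```

Here `right` is smaller than `left` and both are below zero, which illustrates why an approximation that is itself a cdf, as obtained in this section, is convenient.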

In our setup an Edgeworth expansion does not exist, since the limiting distribution is not normal. In this section I show that under some moment conditions our probabilistic expansion corresponds to a distributional expansion. Namely,

$$\sup_x \left| P_{\rho_n}\{t(y, n, \rho_n) < x\} - P\{\xi_n < x\} \right| = o(n^{-1/2}),$$

where $\xi_n = t^c + n^{-1/4} f + n^{-1/2} g$ from part (e) of Theorem 1. That is, in our case $G_n(x) = P\{\xi_n < x\}$ is a cdf. It depends on the parameter vector $\psi$.

Definition 1 (Park (2003a)) A random variable $X$ has a distributional order $o(n^{-a})$ if $P\{|X| > n^{-a}\} \le n^{-a}$.

Theorem 2 Let all assumptions of Theorem 1 hold. Then all $o_p(n^{-1/2})$ terms in statements (a)-(e) of Theorem 1 are of distributional order $o(n^{-1/2})$.

Corollary 1 If the error terms are i.i.d. with mean zero and 8 finite moments, the following distributional expansion holds:

$$\sup_x \left| P\{t(y, \rho_n, n) < x\} - P\{t^c + n^{-1/4} f + n^{-1/2} g < x\} \right| = o(n^{-1/2}).$$

One  can  notice  there  is  no  "unique"  distributional  expansion  even  if  we  require 
that  Gn  is  a  cdf.  This  surprising  fact  is  explained  in  the  note  below. 

Remark 3 Let $G_n(x) = P\{\xi_n < x\}$ be a cdf and assume that $\eta$ has a normal distribution and is independent of the $\sigma$-algebra $\mathcal{A}$. Let $\xi_n$ and $\Gamma$ be measurable with respect to $\mathcal{A}$. If $G_n$ satisfies the distributional approximation (4), then $\tilde G_n(x) = P\{\xi_n + \Gamma\frac{1}{\sqrt n}\eta < x\}$ would also satisfy it. That is, the additional term (which is of probabilistic order $O_p(n^{-1/2})$) has a distributional impact of order $o(n^{-1/2})$. This point was made by Park (2003a). The idea is that the characteristic function of $\xi_n + \Gamma\frac{1}{\sqrt n}\eta$ conditional on $\mathcal{A}$ is equal to $e^{it\xi_n}$ up to the order $O(n^{-1})$.

It might seem strange that the probabilistic expansion of $\frac{1}{n}\sum y_{j-1}\varepsilon_j$ has a term of order $O_p(n^{-1/4})$. This term has a distributional impact of order $O(n^{-1/2})$. The idea of the statement is totally parallel to the remark above. Indeed, $M(V)$ is distributionally $M(1)\cdot\sqrt{|V|}$, where $M(1) \sim N(0, 1)$ and is independent of $B(\cdot) = (w, V, U)$.
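The characteristic-function argument behind Remark 3 and the paragraph above can be written out explicitly. The following is a sketch under the assumptions that the extra summand in Remark 3 has the form $\Gamma n^{-1/2}\eta$ with $\eta \sim N(0,1)$ standard normal, that $\theta$ is a fixed argument, and that $\Gamma$, $J_c(1)$ and $V$ are treated as given when conditioning.

```latex
% Remark 3: an independent normal summand of size n^{-1/2}
E\left[ e^{i\theta(\xi_n + \Gamma n^{-1/2}\eta)} \,\middle|\, \mathcal{A} \right]
  = e^{i\theta\xi_n}\, E\left[ e^{i\theta\Gamma n^{-1/2}\eta} \,\middle|\, \mathcal{A} \right]
  = e^{i\theta\xi_n} \exp\left( -\frac{\theta^2\Gamma^2}{2n} \right)
  = e^{i\theta\xi_n}\left( 1 + O(n^{-1}) \right).

% The n^{-1/4} term: M(V) has the law of M(1)\sqrt{|V|}, M(1) ~ N(0,1) independent of B
E\left[ e^{i\theta n^{-1/4} J_c(1) M(V)} \,\middle|\, B \right]
  = \exp\left( -\frac{\theta^2 J_c^2(1)\,|V|}{2\sqrt{n}} \right)
  = 1 + O(n^{-1/2}).
```

The first line shows why a summand of probabilistic order $n^{-1/2}$ changes the distribution only at order $n^{-1}$, while the second shows that the $n^{-1/4}$ term enters the distribution at order $n^{-1/2}$ and therefore cannot be dropped from a second order expansion.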


Remark 4 Combining Remarks 2 and 3 one gets the following. If the error terms are normally distributed, then we have a distributional equivalence

$$P\{t(z, n, \rho) < x\} = P\{t^c < x\} + o(n^{-1/2}).$$

That is, the difference between the quantiles constructed in Stock's and Andrews' methods is of the order $o(n^{-1/2})$. The two methods achieve the same accuracy up to the second order.

5     Bootstrapped  expansion 

5.1     Embedding  for  bootstrapped  statistic 

In Section 4 we found that the distribution of the t-statistic $t(y, n, \rho_n)$ can be approximated by a sequence of functions $G_n(x) = P\{t^c + n^{-1/4} f + n^{-1/2} g < x\}$, where $f$ and $g$ are functionals of the Brownian motions $B(\cdot)$ (its covariance structure is described in (3) and depends on $\sigma^2, \mu_3, \mu_4, \kappa, c$) and $M$ (independent of $B$).

The bootstrapped statistic has exactly the same form, since it uses the "true value" (not an estimator) of $\rho$ (or $c$). The only difference between the initial distribution of the t-statistic and the grid bootstrapped distribution of the t-statistic is the different distribution of the error term.

$$P^*\{t(y^*, n, \rho) < x\} = G_n^*(x) + o(n^{-1/2}),$$

where $G_n^*(x) = P\{t^c + n^{-1/4} f^* + n^{-1/2} g^* < x\}$, and $f^*$ and $g^*$ are functionals of $B^*$, $M$ (the same functionals; the covariance structure of $B^*$ depends on $\hat\sigma^2, \hat\mu_3, \hat\mu_4, \hat\kappa$).

The next subsection states that the parameter vector $(\hat\sigma^2, \hat\mu_3, \hat\mu_4, \hat\kappa)$ converges almost surely to $(\sigma^2, \mu_3, \mu_4, \kappa)$ at a speed of $O_p(n^{-1/2})$, which is enough to say that the second order terms in the expansions of the initial and grid bootstrapped statistics coincide up to the order of $o(n^{-1/2})$.

Theorem 3 Let us have an AR(1) process (1) with $y_0 = 0$ and error terms satisfying Assumptions A with $r > 8$. Assume that $\rho_n = 1 + c/n$, $c < 0$. Let us consider for every $n$ a process $y^*_j = \rho_n y^*_{j-1} + \varepsilon^*_j$, $y^*_0 = 0$, where $\varepsilon^*_j$ are an i.i.d. sample from the centered and normalized residuals from the initial regression. Then

$$\sup_x \left| P\{t(y, n, \rho_n) < x\} - P^*\{t(y^*, n, \rho_n) < x \mid y\} \right| = o(n^{-1/2}) \quad a.s.$$

X 

Theorem 3 states that Hansen's grid bootstrap provides the second order improvement compared with Andrews' and Stock's methods in the local to unity asymptotic approach. The intuition for that is the usual one. The second order term depends on the parameters of the distribution of the error terms. Those parameters are well approximated by their sample analogues. The non-parametric grid bootstrap uses sampled residuals whose parameters are very close to the population values. As a result, the refinement is achieved. The only parameter (on which the limiting expansion depends) that could not be well estimated is the local to unity parameter $c$. The grid bootstrap procedure uses the "true" value of $c$.

Theorem 3 is a statement obtained in the local to unity asymptotic approach. The statement that Hansen's grid bootstrap achieves a second order refinement in classical asymptotics is an easy one. It could be obtained from the Edgeworth expansion along the lines suggested in Bose (1988). As a result, we should advise applied researchers to choose Hansen's grid bootstrap over Andrews' and Stock's methods.

5.2     Convergence  of  parameters 

This subsection is a part of the proof of Theorem 3 from the previous subsection. Here we show that the parameter vector $\psi = (\sigma^2, \mu_3, \mu_4, \kappa)$ can be well approximated by its sample analog (moments of residuals) $\hat\psi = (\hat\sigma^2, \hat\mu_3, \hat\mu_4, \hat\kappa)$.

Lemma 1 Let the error terms $\varepsilon_j$ satisfy the set of Assumptions A. Then there is a Skorokhod embedding for which

The convergence of the third and fourth sample moments of residuals to their population analogues at a speed of $O(n^{-1/2})$ is a standard statement. For that we need to require enough moments of the error term; 8 moments should be enough.

One parameter, $\kappa$, as was discussed above, is not intrinsic (it depends on the way the Skorokhod embedding was realized). The fact that $\hat\kappa \to_p E\tau^2$ with speed $O(n^{-1/2})$ is non-trivial, mainly because most of the known constructions are not explicit and the dependence of the moments of $\tau$ on the distribution of $\varepsilon$ is not evident. By a messy calculation I got that in the initial Skorokhod construction published in Skorokhod's book (1965) $E\tau^2$ is a constant multiple of $E\varepsilon^4$. That would imply the speed of convergence we need.
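The sample analogues of $\mu_3$ and $\mu_4$ used by the grid bootstrap are plain moments of the centered and normalized residuals. A minimal sketch follows; the function name is hypothetical, and $\hat\kappa$ is deliberately left out because it depends on the particular Skorokhod embedding.

```python
import math


def standardized_moments(residuals):
    """Center and scale residuals, then return their 3rd and 4th sample moments."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    # After this scaling the sample variance is exactly one.
    scale = math.sqrt(sum(r * r for r in centered) / n)
    z = [r / scale for r in centered]
    mu3_hat = sum(v ** 3 for v in z) / n
    mu4_hat = sum(v ** 4 for v in z) / n
    return mu3_hat, mu4_hat
```

For residuals that only take the values $-1$ and $1$ with equal frequency, this returns a third moment of 0 and a fourth moment of 1.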

6     Appendix.  Proofs  of  results 

We use the following results from Park (2003a):

Lemma 2 (Park (2003a), Lemma 3.5(a)) If $r > 8$, then

(a)

$$\frac{1}{\sqrt n}\sum_{j=1}^{n}\varepsilon_j = w(1) + n^{-1/4} M(V) + n^{-1/2} N(V) + o_p(n^{-1/2}),$$

where $V = V(1)$.

About convergence of stochastic integrals:

Lemma 3 (Kurtz and Protter) For each $n$, let $(X_n, Y_n)$ be an $\mathcal{F}^n_t$-adapted process with sample paths in the Skorokhod space $D$ and let $Y_n$ be an $\mathcal{F}^n_t$-semimartingale. Suppose that $Y_n = M_n + A_n + Z_n$, where $M_n$ is a local $\mathcal{F}^n_t$-martingale, $A_n$ is an $\mathcal{F}^n_t$-adapted finite variation process and $Z_n$ is constant except for finitely many discontinuities. Let $N_n(t)$ denote the number of discontinuities of the process $Z_n$ on the interval $[0, t]$. Suppose that $N_n$ is stochastically bounded for each $t > 0$. Suppose that for each $\alpha > 0$ there exist stopping times $\{\tau_n^\alpha\}$ such that $P\{\tau_n^\alpha \le \alpha\} \le 1/\alpha$ and $\sup_n E\left[[M_n]_{t\wedge\tau_n^\alpha} + T_{t\wedge\tau_n^\alpha}(A_n)\right] < \infty$.

If $(X_n, Y_n, Z_n) \to_d (X, Y, Z)$ in the Skorokhod topology, then $Y$ is a semimartingale with respect to a filtration to which $X$ and $Y$ are adapted, and $(X_n, Y_n, \int X_n\,dY_n) \to_d (X, Y, \int X\,dY)$ in the Skorokhod topology. If $(X_n, Y_n, Z_n) \to (X, Y, Z)$ in probability, then convergence in probability holds in the conclusion.

Proof of Theorem 1

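(a)

$$\frac{y_k}{\sqrt n} - J_c(T_{n,k}) = \sum_{j=1}^{k}\left(e^{c(k-j)/n} - e^{c(T_{n,k} - T_{n,j})}\right)\left(w(T_{n,j}) - w(T_{n,j-1})\right) + \sum_{j=1}^{k}\int_{T_{n,j-1}}^{T_{n,j}}\left(e^{c(T_{n,k} - T_{n,j})} - e^{c(T_{n,k} - s)}\right)dw(s)$$

$$= -\frac{c}{\sqrt n}\int_0^{k/n} e^{c(k/n - s)} J_c(s)\,dV(s) + R_{1,n} + R_{2,n} + R_{3,n},$$

where we have the following lines of reasoning:

$$\sum_{j=1}^{k}\left(e^{c(k-j)/n} - e^{c(T_{n,k} - T_{n,j})}\right)\left(w(T_{n,j}) - w(T_{n,j-1})\right)$$

$$= -\frac{c}{\sqrt n}\sum_{j=1}^{k} e^{c(k-j)/n}\sum_{i=j+1}^{k}\left(V_n(i/n) - V_n((i-1)/n)\right)\left(w(T_{n,j}) - w(T_{n,j-1})\right) + R_{1,n}$$

$$= -\frac{c}{\sqrt n}\sum_{i=1}^{k}\sum_{j=1}^{i-1} e^{c(k-j)/n}\left(V_n(i/n) - V_n((i-1)/n)\right)\left(w(T_{n,j}) - w(T_{n,j-1})\right) + R_{1,n}$$

$$= -\frac{c}{n}\sum_{i=1}^{k} e^{c(k-i+1)/n}\left(V_n(i/n) - V_n((i-1)/n)\right) y_{i-1} + R_{1,n}$$

$$= -\frac{c}{\sqrt n}\int_0^{k/n} e^{c(k/n - s)} J_c(s)\,dV(s) + R_{1,n} + R_{2,n}.$$

The remainder terms are

$$|R_{1,n}| \le \sum_{j=1}^{k} c^2 e^{c(k-j)/n}\left((T_{n,k} - T_{n,j}) - \frac{k-j}{n}\right)^2 \left|w(T_{n,j}) - w(T_{n,j-1})\right|,$$

$$R_{2,n} = -\frac{c}{n}\sum_{i=1}^{k} e^{c(k-i+1)/n}\left(V_n(i/n) - V_n((i-1)/n)\right) y_{i-1} + \frac{c}{\sqrt n}\int_0^{k/n} e^{c(k/n - s)} J_c(s)\,dV(s),$$

$$R_{3,n} = \sum_{j=1}^{k}\int_{T_{n,j-1}}^{T_{n,j}}\left(e^{c(T_{n,k} - T_{n,j})} - e^{c(T_{n,k} - s)}\right)dw(s).$$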

(b) 


where we used statement (a). In the next theorem we will need to estimate the distributional impact of the $o_p$ terms, so I keep track of them.

^  V  V  ^  J  \/n      ^/n   Jo   Jo 

n 

V^f^^Jo  y/n       VnJo   Jo 


So 

fl     rt 

n 


^  tr;  ,._i  v"     v"  7o  Jo 


fc=i  fc=i 

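Now

$$\sum_{k=1}^{n} J_c(T_{n,k-1})\left(w(T_{n,k}) - w(T_{n,k-1})\right) = \int_0^1 J_c(s)\,dw(s) + \int_1^{T_{n,n}} J_c(s)\,dw(s) - \sum_{k=1}^{n}\int_{T_{n,k-1}}^{T_{n,k}}\left(J_c(s) - J_c(T_{n,k-1})\right)dw(s).$$

By the definition of the O-U process, $J_c(s) = w(s) + c\int_0^s J_c(t)\,dt$:

$$\sum_{k=1}^{n}\int_{T_{n,k-1}}^{T_{n,k}}\left(J_c(s) - J_c(T_{n,k-1})\right)dw(s) = \sum_{k=1}^{n}\int_{T_{n,k-1}}^{T_{n,k}}\left(w(s) - w(T_{n,k-1})\right)dw(s) + c\sum_{k=1}^{n}\int_{T_{n,k-1}}^{T_{n,k}}\left(B(s) - B(T_{n,k-1})\right)dw(s),$$

where $B(s) = \int_0^s J_c(t)\,dt$. Let $R_{6,n} = \sum_{k=1}^{n}\int_{T_{n,k-1}}^{T_{n,k}}\left(B(s) - B(T_{n,k-1})\right)dw(s)$. By definition,

$$\int_{T_{n,k-1}}^{T_{n,k}}\left(w(s) - w(T_{n,k-1})\right)dw(s) = \frac{\left(w(T_{n,k}) - w(T_{n,k-1})\right)^2 - (T_{n,k} - T_{n,k-1})}{2}.$$

As a result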


"  nT     I.  1 


fc=l  •''^n,k-l 


2^^ 


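Now the last term. We know that $d(J_c^2(x)) = 2J_c\,dw + 2cJ_c^2\,dx + dx$, so

$$\int_1^{T_{n,n}} J_c(s)\,dw(s) = \frac{1}{2}\left(J_c^2(T_{n,n}) - J_c^2(1)\right) - \int_1^{T_{n,n}}\left(cJ_c^2(x) + \frac{1}{2}\right)dx$$

$$= J_c(1)\left(J_c(T_{n,n}) - J_c(1)\right) + \frac{1}{2}\left(J_c(T_{n,n}) - J_c(1)\right)^2 - \frac{1}{\sqrt n}V\cdot\left(cJ_c^2(1) + \frac{1}{2}\right) + R_{5,n}$$

$$= J_c(1)\left(w(T_{n,n}) - w(1)\right) + \frac{1}{2}\left(w(T_{n,n}) - w(1)\right)^2 - \frac{1}{2\sqrt n}V + R_{5,n}$$

$$= n^{-1/4} J_c(1) M(V) + n^{-1/2}\left(J_c(1) N(V) + \frac{1}{2}M^2(V) - \frac{1}{2}V\right) + R_{5,n} + o_p(n^{-1/2}).$$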

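Here the last equality is due to statement (a) of Lemma 2. The definition of the error terms is

$$R_{7,n} = \int_1^{T_{n,n}}\left(J_c^2(x) - J_c^2(1)\right)dx, \qquad R_{8,n} = \int_1^{T_{n,n}}\left(J_c(x) - J_c(1)\right)dx, \qquad R_{5,n} = -cR_{7,n} + cJ_c(1)R_{8,n}.$$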

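As a result,

$$\frac{1}{n}\sum_{k=1}^{n} y_{k-1}\varepsilon_k = \int_0^1 J_c(x)\,dw(x) + n^{-1/4} J_c(1) M(V) + \frac{1}{\sqrt n}\left(-c\int_0^1\!\!\int_0^t e^{c(t-s)} J_c(s)\,dV(s)\,dw(t) + J_c(1) N(V) + \frac{1}{2}M^2(V) + \frac{1}{2}U\right) + o_p(n^{-1/2}).$$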
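(c) Using the statement of part (a),

$$\frac{1}{n}\sum_{k=1}^{n}\left(\frac{y_k^2}{n} - J_c^2(T_{n,k})\right) = \frac{1}{n}\sum_{k=1}^{n}\left(-\frac{c}{\sqrt n}\int_0^{k/n} e^{c(k/n - s)} J_c(s)\,dV(s) + o_p(n^{-1/2})\right)\left(2J_c(T_{n,k}) + o_p(n^{-1/2})\right)$$

$$= -\frac{2c}{\sqrt n}\int_0^1 J_c(x)\int_0^x e^{c(x-s)} J_c(s)\,dV(s)\,dx + R_{9,n} + o_p(n^{-1/2}),$$

where

$$R_{9,n} = \frac{2}{\sqrt n}\left(\frac{1}{n}\sum_{k=1}^{n} B_1(k/n) - \int_0^1 B_1(x)\,dx\right), \qquad B_1(x) = -cJ_c(x)\int_0^x e^{c(x-s)} J_c(s)\,dV(s).$$

As a result,

$$\frac{1}{n^2}\sum_{k=1}^{n} y_{k-1}^2 - \frac{1}{n}\sum_{k=1}^{n} J_c^2(T_{n,k-1}) = -\frac{2c}{\sqrt n}\int_0^1 J_c(x)\int_0^x e^{c(x-s)} J_c(s)\,dV(s)\,dx + o_p(n^{-1/2}).$$

Now,

$$\frac{1}{n}\sum_{k=1}^{n} J_c^2(T_{n,k-1}) - \int_0^1 J_c^2(x)\,dx = \sum_{k=1}^{n} J_c^2(T_{n,k-1})\left(\frac{1}{n} - (T_{n,k} - T_{n,k-1})\right) - \sum_{k=1}^{n}\int_{T_{n,k-1}}^{T_{n,k}}\left(J_c^2(t) - J_c^2(T_{n,k-1})\right)dt + \int_1^{T_{n,n}} J_c^2(t)\,dt.$$

Let us consider each term separately:

$$\sum_{k=1}^{n} J_c^2(T_{n,k-1})\left(\frac{1}{n} - (T_{n,k} - T_{n,k-1})\right) = -\frac{1}{\sqrt n}\int_0^1 J_c^2(x)\,dV(x) + R_{10,n},$$

where

$$R_{10,n} = -\frac{1}{\sqrt n}\left(\sum_{k=1}^{n} J_c^2(T_{n,k-1})\left(V_n(k/n) - V_n((k-1)/n)\right) - \int_0^1 J_c^2(x)\,dV(x)\right),$$

$$-\sum_{k=1}^{n}\int_{T_{n,k-1}}^{T_{n,k}}\left(J_c^2(t) - J_c^2(T_{n,k-1})\right)dt = R_{11,n} = o_p(n^{-1/2}),$$

$$\int_1^{T_{n,n}} J_c^2(t)\,dt = \frac{1}{\sqrt n} J_c^2(1) V + R_{7,n}.$$

Summing up:

$$\frac{1}{n^2}\sum_{k=1}^{n} y_{k-1}^2 = \int_0^1 J_c^2(x)\,dx - \frac{2c}{\sqrt n}\int_0^1 J_c(x)\int_0^x e^{c(x-s)} J_c(s)\,dV(s)\,dx - \frac{1}{\sqrt n}\int_0^1 J_c^2(x)\,dV(x) + \frac{1}{\sqrt n} J_c^2(1) V + o_p(n^{-1/2}).$$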


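(d) Using the statement of part (a),

$$\frac{1}{n}\sum_{k=1}^{n}\left(\frac{y_{k-1}}{\sqrt n} - J_c(T_{n,k-1})\right) = -\frac{c}{\sqrt n}\int_0^1\!\!\int_0^x e^{c(x-s)} J_c(s)\,dV(s)\,dx + o_p(n^{-1/2}).$$

As a result,

$$\frac{1}{n^{3/2}}\sum_{k=1}^{n} y_{k-1} - \frac{1}{n}\sum_{k=1}^{n} J_c(T_{n,k-1}) = -\frac{c}{\sqrt n}\int_0^1\!\!\int_0^x e^{c(x-s)} J_c(s)\,dV(s)\,dx + o_p(n^{-1/2}).$$

Now,

$$\frac{1}{n}\sum_{k=1}^{n} J_c(T_{n,k-1}) - \int_0^1 J_c(x)\,dx = \sum_{k=1}^{n} J_c(T_{n,k-1})\left(\frac{1}{n} - (T_{n,k} - T_{n,k-1})\right) - \sum_{k=1}^{n}\int_{T_{n,k-1}}^{T_{n,k}}\left(J_c(t) - J_c(T_{n,k-1})\right)dt + \int_1^{T_{n,n}} J_c(t)\,dt.$$

Let me consider each term separately:

$$\sum_{k=1}^{n} J_c(T_{n,k-1})\left(\frac{1}{n} - (T_{n,k} - T_{n,k-1})\right) = -\frac{1}{\sqrt n}\int_0^1 J_c(x)\,dV(x) + R_{12,n},$$

where

$$R_{12,n} = -\frac{1}{\sqrt n}\left(\sum_{k=1}^{n} J_c(T_{n,k-1})\left(V_n(k/n) - V_n((k-1)/n)\right) - \int_0^1 J_c(x)\,dV(x)\right),$$

$$-\sum_{k=1}^{n}\int_{T_{n,k-1}}^{T_{n,k}}\left(J_c(t) - J_c(T_{n,k-1})\right)dt = R_{13,n},$$

$$\int_1^{T_{n,n}} J_c(t)\,dt = \frac{1}{\sqrt n} J_c(1) V + R_{8,n}.$$

Summary:

$$\frac{1}{n^{3/2}}\sum_{k=1}^{n} y_{k-1} = \int_0^1 J_c(x)\,dx - \frac{c}{\sqrt n}\int_0^1\!\!\int_0^x e^{c(x-s)} J_c(s)\,dV(s)\,dx - \frac{1}{\sqrt n}\int_0^1 J_c(x)\,dV(x) + \frac{1}{\sqrt n} J_c(1) V + o_p(n^{-1/2}).$$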

Part  (e)  follows  from  (a)-(d)  and  Taylor  expansion. 

Now we need to check that for all $i$ we have $R_{i,n} = o_p(n^{-1/2})$. The statements for $R_{2,n}$, $R_{5,n}$, $R_{10,n}$ and $R_{12,n}$ follow from Lemma 3 on convergence of stochastic integrals. From convergence of non-stochastic integrals we have $R_{9,n} = o_p(n^{-1/2})$.

The terms $R_{11,n}$ and $R_{13,n}$ have the structure $\sum_{k=1}^{n}\xi_{k,n}$, where $\xi_{k,n}$ are i.i.d. across $k$ and distributionally equal to $\xi_{1,n} \sim \int_0^{\tau/n} C(t)\,dt$ for $dC(t) = C_1(t)\,dt + C_2(t)\,dw(t)$ with $C_1 \in L_1$, $C_2 \in L_2$. Then $E\xi_{1,n} = E\int_0^{\tau/n}\int_0^t C_1(s)\,ds\,dt \le const\cdot n^{-2}$, and

$$E\xi_{1,n}^2 \le E\left(\int_0^{\tau/n}\int_0^t\int_0^u C_2^2(s)\,ds\,du\,dt\right) \le const\cdot n^{-3}.$$

As a result, both terms are $O_p(n^{-1})$. We can also notice that Chebyshev's inequality implies that they are distributionally $o(n^{-1/2})$.

The terms $R_{3,n}$ and $R_{6,n}$ also have the structure $\sum_{k=1}^{n}\xi_{k,n}$, where $\xi_{k,n}$ are i.i.d. across $k$. Here $\xi_{1,n} \sim \int_0^{\tau/n} C(t)\,dw(t)$ with $dC(t) = C_1\,dt$. It is easy to see that $E\xi_{1,n} = 0$ and $E\xi_{1,n}^2 \le const\cdot n^{-3}$. It implies that these terms are probabilistically and distributionally $o(n^{-1/2})$.

The terms $R_{7,n}$ and $R_{8,n}$ have the form $\int_1^{T_{n,n}}(D(s) - D(1))\,ds$, where $dD(s) = d_1\,dw + d_2\,dt$. It is easy to see that they are $o_p(n^{-1/2})$. $\square$

Lemma 4 (Park (2003a)) If $r > 4$ then we may choose $B_n$ and $B$ such that

$$P\left\{\sup_{0\le t\le 1}|B_n(t) - B(t)| > c\right\} \le n^{1-r/4}c^{-r/2}(1 + \sigma^{-r})K(1 + E|\varepsilon_j|^r)$$

for any $c \ge n^{-1/2+2/r}$.

Proof of Theorem 2. We need to check that all terms $R_{i,n}$ used in the proof of Theorem 1 are distributionally of order $o(n^{-1/2})$.

In the proof of Theorem 1 we already showed that the terms $R_{3,n}$, $R_{6,n}$, $R_{11,n}$ and $R_{13,n}$ are distributionally $o(n^{-1/2})$.

The terms $R_{2,n}$, $R_{5,n}$, $R_{10,n}$ and $R_{12,n}$ have the form of stochastic integrals $\frac{1}{\sqrt n}\int_0^1\xi(t)\,d(V(t) - V_n(t))$ or $\frac{1}{\sqrt n}\int_0^1\xi(t)\,d(w(t) - w_n(t))$. Their distributional order depends on the quadratic variations, which have the form $\sup_{0\le t\le 1}|V_n(t) - V(t)|^2$ and $\sup_{0\le t\le 1}|w_n(t) - w(t)|^2$. The order of the last expressions is determined by Lemma 4.

The terms $R_{7,n}$ and $R_{8,n}$ have the form $\left|\int_1^{T_{n,n}}(D(s) - D(1))\,ds\right| \le \sup_{0\le t\le 1}|D(t)|\cdot|T_{n,n} - 1|$, which is distributionally $o(n^{-1/2})$.

Proof of Lemma 1.

First, I show that for the Skorokhod construction presented in Skorokhod's book (1965) $E\tau^2$ is a constant multiple of $E\varepsilon^4$.

Let $T_{a,b}$ be the smallest root of the equation $(w(t) - a)(w(t) - b) = 0$. Then

$$E e^{-\lambda T_{a,b}} = \frac{\sinh\left(b\sqrt{2\lambda}\right) - \sinh\left(a\sqrt{2\lambda}\right)}{\sinh\left((b - a)\sqrt{2\lambda}\right)} \qquad (5)$$

and

$$(-1)^k\frac{d^k}{d\lambda^k}E e^{-\lambda T_{a,b}}\Big|_{\lambda=0} = E T_{a,b}^k. \qquad (6)$$

In the construction from Skorokhod's book (1965) the stopping time is defined as

$$\tau = \inf\{t : (w(t) - \varepsilon)(w(t) - G(\varepsilon)) = 0\} = T_{\varepsilon, G(\varepsilon)},$$

where $\varepsilon$ is independent of $w$ and the function $G$ is defined by $\int_x^{G(x)} y\,dF(y) = 0$, $F(x) = P\{\varepsilon \le x\}$. Then $w(\tau)$ has the same distribution as $\varepsilon$.

We can notice that

$$E\tau^2 = \int E T_{x, G(x)}^2\,dF(x).$$

The last could be calculated using equation (6) for the moments of $T_{a,b}$ and the explicit formula (5). We also use the following two facts: $G(G(x)) = x$ and $G(x)\,dF(G(x)) = x\,dF(x)$. By tedious but straightforward calculations one can obtain the formula expressing $E\tau^2$ as a constant multiple of $E\varepsilon^4$.

Since $E\varepsilon^8 < \infty$, by using Chebyshev's inequality one can get the statement of the lemma.

Proof of Theorem 3. Theorem 2 states the distributional expansions for the t-statistic and its grid bootstrapped analog:

$$\sup_x |P\{t(y, n, \rho) < x\} - G_n(x)| = o(n^{-1/2}), \qquad (7)$$

where $G_n(x) = P\{t^c + n^{-1/4} f + n^{-1/2} g < x\}$, and $f$ and $g$ are functionals of the Brownian motions $B(\cdot)$. The covariance structure of $B$ is described in (3); it depends on $\sigma^2, \mu_3, \mu_4, \kappa, c$. The Brownian motion $M$ is independent of $B$. It can be seen from the proof of Theorem 2 that the term $o(n^{-1/2})$ in equation (7) is bounded by a constant depending on the eighth moment $\mu_8$ of the approximated error term, times $n^{-1/2-\delta}$ for some $\delta > 0$.

For almost any realization of an infinite sequence of error terms $(\varepsilon_1, \dots, \varepsilon_n, \dots)$ and its finite subsequence of length $n$ we would have

$$\sup_x |P^*\{t(y^*, n, \rho) < x\} - G_n^*(x)| \le Const(\hat\mu_8)\,n^{-1/2-\delta},$$

where $G_n^*(x) = P\{t^c + n^{-1/4} f^* + n^{-1/2} g^* < x\}$, and $f^*$ and $g^*$ are the same functionals of $B^*$, $M^*$. The covariance structure of $B^*$ depends on $\hat\sigma^2, \hat\mu_3, \hat\mu_4, \hat\kappa$. Since $r > 8$, $Const(\hat\mu_8)\,n^{-1/2-\delta} = o(n^{-1/2})$ a.s. As a result,

$$\sup_x |P^*\{t(y^*, n, \rho) < x\} - G_n^*(x)| = o(n^{-1/2}) \quad a.s. \qquad (8)$$

Combining equations (7) and (8) with Lemma 1 one obtains the statement of Theorem 3.